
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
ARTIFICIAL INTELLIGENCE LABORATORY 



A.I. Memo No. 894 April, 1986 

Computational Complexity of Current GPSG Theory 

Eric Sven Ristad 



ABSTRACT:

An important goal of computational linguistics has been to use linguistic theory to guide the construction of computationally efficient real-world natural language processing systems. At first glance, the entirely new generalized phrase structure grammar (GPSG) theory of Gazdar, Klein, Pullum, and Sag (1985) appears to be a blessing on two counts. First, their precise formal system and the broad empirical coverage of their published English grammar might be a direct guide for a transparent parser design and implementation. Second, since GPSG has weak context-free generative power and context-free languages can be parsed in O(n^3) by a wide range of algorithms, GPSG parsers would appear to run in polynomial time. This widely-assumed GPSG "efficient parsability" result is misleading: here we prove that the universal recognition problem for the new GPSG theory is exponential-polynomial time hard, and assuredly intractable. The paper pinpoints sources of intractability (e.g. metarules and syntactic features) in the GPSG formal system and concludes with some linguistically and computationally motivated restrictions on GPSG.



This report describes research done in part at the Artificial Intelligence Laboratory 
of the Massachusetts Institute of Technology. Support for the Laboratory's artificial 
intelligence research has been provided in part by the Advanced Research Projects 
Agency of the Department of Defense under Office of Naval Research contract N00014- 
80-C-0505. This paper will be presented at the 1986 ACL Conference in June. 

©Eric Sven Ristad, 1986 




1 Introduction 

An important goal of computational linguistics has been to use linguistic theory to guide the construction of computationally efficient real-world natural language processing systems. Generalized Phrase Structure Grammar (GPSG) linguistic theory holds out considerable promise as an aid in this task. The precise formalisms of GPSG offer the prospect of a direct and transparent guide for parser design and implementation. Furthermore, and more importantly, GPSG's weak context-free generative power suggests an efficiency advantage for GPSG-based parsers. Since context-free languages can be parsed in polynomial time, it seems plausible that GPSGs can also be parsed in polynomial time. This would in turn seem to provide "the beginnings of an explanation for the obvious, but largely ignored, fact that humans process the utterances they hear very rapidly" (Gazdar, 1981:155). 1

In this paper I argue that the expectations of the informal complexity argu- 
ment from weak context-free generative power are not in fact met. I begin by 
examining the computational complexity of metarules and the feature system 
of GPSG and show that these systems can lead to computational intractability. 
Next I prove that the universal recognition problem for current GPSG theory is 
Exp-Poly hard, and assuredly intractable. 2 That is, the problem of determining 
for an arbitrary GPSG G and input string x whether x is in the language L(G) 
generated by G, is exponential polynomial time hard. This result puts GPSG- 
Recognition in a complexity class occupied by few natural problems: GPSG- 
Recognition is harder than the traveling salesman problem, context-sensitive 
language recognition, or winning the game of Chess on an n x n board. The 
complexity classification shows that the fastest recognition algorithm for GPSGs 
must take exponential time or worse. One role of a computational analysis is to 
provide formal insights into linguistic theory. To this end, this paper pinpoints 
sources of complexity in the current GPSG theory and concludes with some 
linguistically and computationally motivated restrictions. 



2 Complexity of GPSG Components 

A generalized phrase structure grammar contains five language-particular com- 
ponents — immediate dominance (ID) rules, metarules, linear precedence (LP) 

1 See also Joshi, "Tree Adjoining Grammars," p. 226, in Natural Language Parsing (1985), ed. by D. Dowty, L. Karttunen, and A. Zwicky, Cambridge University Press: Cambridge, and "Exceptions to the Rule," Science News 128: 314-315.

2 We use the universal problem to more accurately explore the power of a grammatical formalism (see section 3.1 below for support). Ristad(1985) has previously proven that the universal recognition problem for the GPSGs of Gazdar(1981) is NP-hard and likely to be intractable, even under severe metarule restrictions.






statements, feature co-occurrence restrictions (FCRs), and feature specification 
defaults (FSDs) — and four universal components — a theory of syntactic fea- 
tures, principles of universal feature instantiation, principles of semantic inter- 
pretation, and formal relationships among various components of the grammar. 3 

Syntactic categories are partial functions from features to atomic feature 
values and syntactic categories. They encode subcategorization, agreement, 
unbounded dependency, and other significant syntactic information. The set K 
of syntactic categories is inductively specified by listing the set F of features, 
the set A of atomic feature values, the function p that defines the range of 
each atomic-valued feature, and a set R of restrictive predicates on categories 
(FCRs). 
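As a rough illustration of categories as partial functions and of the extension relation used below, the following Python sketch encodes a category as a dictionary whose values are either atomic symbols or embedded categories; the dictionary encoding and the helper name extends are illustrative assumptions, not GKPS's formal definitions.

# A syntactic category as a partial function from features to values,
# where a value is either an atomic symbol or another (embedded) category.
# Hypothetical encoding for illustration only.

def extends(c_prime, c):
    """True if category c_prime is an extension of category c:
    every feature specification in c is matched (recursively) in c_prime."""
    for feature, value in c.items():
        if feature not in c_prime:
            return False
        other = c_prime[feature]
        if isinstance(value, dict) and isinstance(other, dict):
            if not extends(other, value):      # embedded categories: recurse
                return False
        elif other != value:                   # atomic values must agree exactly
            return False
    return True

# Example: a VP category with an embedded SLASH (unbounded dependency) value.
vp = {"N": "-", "V": "+", "BAR": "2", "SLASH": {"N": "+", "V": "-"}}
vp_fin = {**vp, "VFORM": "FIN"}                # adds a specification
print(extends(vp_fin, vp))                     # True: vp_fin extends vp
print(extends(vp, vp_fin))                     # False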

The set of ID rules obtained by taking the finite closure of the metarules on 
the ID rules is mapped into local phrase structure trees, subject to principles 
of universal feature instantiation, FSDs, FCRs, and LP statements. Finally, 
local trees are assembled to form phrase structure trees, which are terminated 
by lexical elements. 

To identify sources of complexity in GPSG theory, we consider the isolated 
complexity of the finite metarule closure operation and the rule to tree map- 
ping, using the finite closure membership and category membership problems, 
respectively. Informally, the finite closure membership problem is to determine 
if an ID rule is in the finite closure of a set of metarules M on a set of ID rules 
R. The category membership problem is to determine if a category C or a legal 
extension of C is in the set K of all categories based on the function p and the 
sets A, F and R. Note that both problems must be solved by any GPSG-based 
parsing system when computing the ID rule to local tree mapping. 

The major results are that finite closure membership is NP-hard and cat- 
egory membership is PSPACE-hard. Barton(1985) has previously shown that 
the recognition problem for ID/LP grammars is NP-hard. The components of 
GPSG theory are computationally complex, as is the theory as a whole. 

Assumptions. In the following problem definitions, we allow syntactic categories to be based on arbitrary sets of features and feature values. In actuality, GPSG syntactic categories are based on fixed sets and a fixed function p. As such, the set K of permissible categories is finite, and a large table containing K could, in principle, be given. 4 We generalize to arbitrary sets and an arbitrary function p to prevent such a solution while preserving GPSG's theory of syntactic features. 5 No other modifications to the theory are made.


3 This work is based on current GPSG theory as presented in Gazdar et al. (1985), hereafter GKPS. The reader is urged to consult that work for a formal presentation and thorough exposition of current GPSG theory.

4 This suggestion is of no practical significance, because the actual number of GPSG syntactic categories is extremely large. The total number of categories, given the 25 atomic features and 4 category-valued features, is:

|K = K^4| = 3^25 (1 + 3^25 (1 + 3^25 (1 + 3^25 (1 + 3^25)^1)^2)^3)^4
          = 3^25 (1 + 3^25)^64 > 3^1625 > 10^775

See page 10 for details. Many of these categories will be linguistically meaningless, but all GPSGs will generate all of them and then filter some out in consideration of FCRs, FSDs, universal feature instantiation, and the other admissible local trees and lexical entries in the GPSG. While the FCRs in some grammars may reduce the number of categories, FCRs are a language-particular component of the grammar. The vast number of categories cited above is inherent in the GPSG framework.






An ambiguity in GKPS is how FCRs actually apply to embedded categories. Following Ivan Sag (personal communication), I make the natural assumption here that FCRs apply to top-level and to embedded categories equally. 6



2.1 Metarules 

The complete set of ID rules in a GPSG is the maximal set that can be arrived 
at by taking each metarule and applying it to the set of rules that have not 
themselves arisen as a result of the application of that metarule. This maximal 
set is called the finite closure (FC) of a set R of lexical ID rules under a set M 
of metarules. 
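The shape of this operation can be illustrated with a minimal Python sketch, under a deliberately simplified metarule format (a pattern on the mother plus a set of feature changes) rather than GKPS's full metarule notation; the key point is that metarules may chain but never apply to rules they themselves produced.

# Finite closure of a set of ID rules under a set of metarules, in a
# simplified setting: a rule is (mother, rhs) with the mother a frozenset of
# (feature, value) pairs; a metarule is (pattern, changes) and rewrites any
# rule whose mother contains the pattern.  Illustrative sketch only.

def apply_metarule(rule, metarule):
    (mother, rhs), (pattern, changes) = rule, metarule
    if not pattern <= mother:
        return None
    changed_features = {f for f, _ in changes}
    new_mother = frozenset((f, v) for f, v in mother
                           if f not in changed_features) | changes
    return (new_mother, rhs)

def finite_closure(rules, metarules):
    # Track, for each derived rule, which metarules produced it, so that a
    # metarule is never applied to its own output (the FC restriction).
    agenda = [(r, frozenset()) for r in rules]
    seen, result = set(agenda), set(rules)
    while agenda:
        rule, used = agenda.pop()
        for i, mr in enumerate(metarules):
            if i in used:
                continue
            new_rule = apply_metarule(rule, mr)
            if new_rule and (new_rule, used | {i}) not in seen:
                item = (new_rule, used | {i})
                seen.add(item)
                result.add(new_rule)
                agenda.append(item)
    return result

# Tiny example: two metarules, each adding one feature, yield 4 rules.
r = (frozenset({("STAGE", 1)}), ("t",))
m1 = (frozenset({("STAGE", 1)}), frozenset({("F1", 1)}))
m2 = (frozenset({("STAGE", 1)}), frozenset({("F2", 1)}))
print(len(finite_closure({r}, [m1, m2])))      # 4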

The cleanest possible complexity proof for metarule finite closure would fix 
the GPSG (with the exception of metarules) for a given problem, and then 
construct metarules dependent on the problem instance that is being reduced. 
Unfortunately, metarules cannot be cleanly removed from the GPSG system. 
Metarules take ID rules as input, and produce other ID rules as their output. 
If we were to separate metarules from their inputs and outputs, there would be 
nothing left to study. 

The best complexity proof for metarules, then, would fix the GPSG modulo the metarules and their input. We ensure the input is not inadvertently performing some computation by requiring the one ID rule R allowed in the reduction to be fully specified, with only one 0-level category on the left-hand side and one unanalyzable terminal symbol on the right-hand side. Furthermore, no FCRs, FSDs, or principles of universal feature instantiation are allowed to apply. These are exceedingly severe constraints. The ID rules generated by this formal system will be the finite closure of the lone ID rule R under the set M of metarules.




5 Our goal is to identify sources of complexity in GPSG theory. The generalization to arbitrary sets allows a fine-grained study of one component of GPSG theory (the theory of syntactic features) with the tools of computational complexity theory. Similarly, the chess board is uncontroversially generalized to size n x n in order to study the computational complexity of chess.

6 A category C that is defined for a feature f, f ∈ (F − Atom) ∩ DOM(C) (e.g. f = SLASH), contains an embedded category C_e, where C(f) = C_e. GKPS does not explain whether FCRs must be true of C_e as well as C.






The (strict, resp.) finite closure membership problem for GPSG metarules is: Given an ID rule r and sets of metarules M and ID rules R, determine if there exists an r' such that r' is an extension of r (r' = r, resp.) and r' ∈ FC(M, R).

Theorem 1 Finite Closure Membership is NP-hard 

Proof: On input a 3-CNF formula F of length n using the m variables x_1 ... x_m, reduce 3-SAT, a known NP-complete problem, to Metarule-Membership in polynomial time.

The set of ID rules consists of the one ID rule R, whose mother category represents the formula variables and clauses, and a set of metarules M s.t. an extension of the ID rule A is in the finite closure of M over R iff F is satisfiable. The metarules generate possible truth assignments for the formula variables, and then compute the truth value of F in the context of those truth assignments.

Let w be the string of formula literals in F, and let w_i denote the i-th symbol in the string w.

1. The ID rules R, A

   R: F → <satisfiability>

   A: {[STAGE 3]} → <satisfiable>

   where

   <satisfiable> is a terminal symbol
   <satisfiability> is a terminal symbol
   F = {[y_i 0] : 1 ≤ i ≤ m}
       ∪ {[c_i 0] : 1 ≤ i ≤ |w|/3}
       ∪ {[STAGE 1]}

2. Construct the metarules

   (a) m metarules to generate all possible assignments to the variables

       ∀i, 1 ≤ i ≤ m:

       {[y_i 0], [STAGE 1]} → W
                 ⇓                                          (1)
       {[y_i 1], [STAGE 1]} → W

   (b) one metarule to stop the assignment generation process

       {[STAGE 1]} → W
                 ⇓                                          (2)
       {[STAGE 2]} → W

   (c) |w| metarules to verify assignments

       ∀i, j, k: 1 ≤ i ≤ |w|/3, 1 ≤ j ≤ m, 0 ≤ k ≤ 2,
       if w_{3i−k} = x_j, then construct the metarule

       {[y_j 1], [c_i 0], [STAGE 2]} → W
                 ⇓                                          (3)
       {[y_j 1], [c_i 1], [STAGE 2]} → W

       ∀i, j, k: 1 ≤ i ≤ |w|/3, 1 ≤ j ≤ m, 0 ≤ k ≤ 2,
       if w_{3i−k} = ¬x_j, then construct the metarule

       {[y_j 0], [c_i 0], [STAGE 2]} → W
                 ⇓                                          (4)
       {[y_j 0], [c_i 1], [STAGE 2]} → W

   (d) Let the category C = {[c_i 1] : 1 ≤ i ≤ |w|/3}. Construct the metarule

       C[STAGE 2] → W
                 ⇓                                          (5)
       {[STAGE 3]} → <satisfiable>

The reduction constructs O(|w|) metarules of size log(|w|), and clearly may be performed in polynomial time: the reduction time is essentially the number of symbols needed to write down the GPSG. Note that the strict finite closure membership problem is also NP-hard. One need only add a polynomial number of metarules to "change" the feature values of the mother node C to some canonical value when C(STAGE) = 3 — all 0, for example, with the exception of STAGE. Let F = {[y_i 0] : 1 ≤ i ≤ m} ∪ {[c_i 0] : 1 ≤ i ≤ |w|/3}. Then A would be

A: F[STAGE 3] → <satisfiable>

Q.E.D.
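To make the construction concrete, the following Python sketch builds a simplified rendering of the metarules (1)-(5) for a small 3-CNF formula; the tuple encoding of feature specifications is an illustrative assumption, and the resulting set could be fed to a finite-closure procedure like the sketch in section 2.1 to test satisfiability.

# Illustrative sketch (simplified notation): build the metarule set of
# Theorem 1 for a 3-CNF formula.  A literal is +j for x_j or -j for ¬x_j;
# a metarule is written here as a (pattern, target) pair of feature sets.
# This mirrors the construction's shape; it is not GKPS's metarule formalism.

def build_metarules(clauses, m):
    """clauses: list of 3-tuples of signed variable indices; m: #variables."""
    metarules = []
    # (1): m metarules guessing truth assignments while STAGE 1.
    for j in range(1, m + 1):
        metarules.append(({("y", j, 0), ("STAGE", 1)},
                          {("y", j, 1), ("STAGE", 1)}))
    # (2): one metarule ending the guessing phase.
    metarules.append(({("STAGE", 1)}, {("STAGE", 2)}))
    # (3)/(4): |w| metarules marking clause i true when one literal is satisfied.
    for i, clause in enumerate(clauses, start=1):
        for lit in clause:
            j, sign = abs(lit), (1 if lit > 0 else 0)
            metarules.append(({("y", j, sign), ("c", i, 0), ("STAGE", 2)},
                              {("y", j, sign), ("c", i, 1), ("STAGE", 2)}))
    # (5): one metarule producing the <satisfiable> rule when every c_i is 1.
    pattern = {("c", i, 1) for i in range(1, len(clauses) + 1)} | {("STAGE", 2)}
    metarules.append((pattern, {("STAGE", 3)}))
    return metarules

# (x1 v x2 v ¬x3) & (¬x1 v x2 v x3): 2 clauses, 3 variables, 6 literals.
mrs = build_metarules([(1, 2, -3), (-1, 2, 3)], m=3)
print(len(mrs))        # 3 + 1 + 6 + 1 = 11 metarules, i.e. O(|w|) of them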

The major source of intractability is the finite closure operation itself. Informally, each metarule can more than double the number of ID rules, hence by chaining metarules (i.e. by feeding the output of one metarule as input to the next) finite closure can increase the number of ID rules exponentially. 7

7 More precisely, the metarule finite closure operation can increase the size of a GPSG G worse than exponentially: from |G| to O(|G|^(2^|G|)). Given a set of ID rules R of symbol size n, and a set M of m metarules, each of size p, the symbol size of FC(M,R) is O(n^(2^m)) = O(|G|^(2^|G|)). Each metarule can match the productions in R in O(n) different ways, inducing O(n + p) new symbols per match: each metarule can therefore square the ID rule grammar size. There are m metarules, so finite closure can create an ID rule grammar with O(n^(2^m)) symbols.





2.2 A Theory of Syntactic Features 

Here we show that the complex feature system employed by GPSG leads to 
computational intractability. The underlying insight for the following complex- 
ity proof is the almost direct equivalence between Alternating Turing Machines 
(ATMs) and syntactic categories in GPSG. The nodes of an ATM computa- 
tion correspond to 0-level syntactic categories, and the ATM computation tree 
corresponds to a full, n-level syntactic category. The finite feature closure re- 
striction on categories, which limits the depth of category nesting, will limit 
the depth of the corresponding ATM computation tree. Finite feature closure 
constrains us to specifying (at most) a polynomially deep, polynomially branch- 
ing tree in polynomial time. This is exactly equivalent to a polynomial time 
ATM computation, and by Chandra and Stockmeyer(1976), also equivalent to 
a deterministic polynomial space-bounded Turing Machine computation. 

As a consequence of the above insight, one would expect the GPSG Category- 
Membership problem to be PSPACE-hard. The actual proof is considerably 
simpler when framed as a reduction from the Quantified Boolean Formula (QBF) 
problem, a known PSPACE-complete problem. 

Definition:

The QBF problem is defined as follows. "Quantified Boolean formulas (QBF) are built from variables, the operators ∨, ∧, and ¬, parentheses, and the quantifiers ∃ ('there exists') and ∀ ('for all'). When defining QBF's recursively, we find it useful simultaneously to define free occurrences of variables (occurrences to which no quantifier applies), and the scope of a quantifier (those occurrences to which the quantifier applies).

1. If x is a variable, then it is a QBF. The occurrence of x is free.

2. If E_1 and E_2 are QBF's, so are ¬(E_1), (E_1) ∨ (E_2), and (E_1) ∧ (E_2). An occurrence of x is free or bound, depending on whether the occurrence is free or bound in E_1 or E_2. Redundant parentheses can be omitted.

3. If E is a QBF, then ∃x(E) and ∀x(E) are QBF's. The scopes of ∃x and ∀x are all free occurrences of x in E. (Note that there may also be bound occurrences of x in E; these are not part of the scope.) Free occurrences of x in E are bound in ∃x(E) and ∀x(E). All other occurrences of variables








in E are free or bound, depending on whether they are free or bound in E.

"A QBF with no free variables has a value of either true or false, which we denote by the Boolean constants 1 and 0. The value of such a QBF is determined by replacing each subexpression of the form ∃x(E) by E_0 ∨ E_1 and each subexpression of the form ∀x(E) by E_0 ∧ E_1, where E_0 and E_1 are E with all occurrences of x in the scope of the quantifier replaced by 0 and 1, respectively (Hopcroft and Ullman, 1979:343-344)."

The QBF problem is {Q_1 y_1 Q_2 y_2 ... Q_m y_m F(y_1, y_2, ..., y_m) | Q_i ∈ {∀, ∃}, the y_i are boolean variables, F is a boolean formula of length n in conjunctive normal form with exactly three variables per clause (3-CNF), and the quantified formula is true}.
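The evaluation rule quoted above (∃ becomes a disjunction over the two instantiations, ∀ a conjunction) can be rendered as a short Python sketch for prenex formulas with a 3-CNF matrix; the encoding of quantifiers and clauses is an illustrative assumption.

# Evaluate a prenex QBF: quantifiers is a list like [('E', 1), ('A', 2), ...]
# over variables 1..m, and the matrix is a 3-CNF given as clauses of signed
# variable indices (+j for x_j, -j for ¬x_j).  Illustrative sketch only.

def eval_cnf(clauses, assignment):
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)

def eval_qbf(quantifiers, clauses, assignment=None):
    assignment = dict(assignment or {})
    if not quantifiers:
        return eval_cnf(clauses, assignment)
    (q, var), rest = quantifiers[0], quantifiers[1:]
    branches = (eval_qbf(rest, clauses, {**assignment, var: val})
                for val in (False, True))
    # ∃x(E) -> E_0 or E_1 ; ∀x(E) -> E_0 and E_1
    return any(branches) if q == 'E' else all(branches)

# ∀x1 ∃x2 ∃x3 . (x1 v x2 v x3) & (¬x1 v ¬x2 v x3)
print(eval_qbf([('A', 1), ('E', 2), ('E', 3)],
               [(1, 2, 3), (-1, -2, 3)]))      # True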

Let a specification of K be the arbitrary sets of features F, atomic features Atom, atomic feature values A, and feature co-occurrence restrictions R, and let p be an arbitrary function, all equivalent to those defined in chapter 2 of GKPS. The category membership problem is: Given a category C and a specification of a set K of syntactic categories, determine if there exists a C' s.t. C' is an extension of C and C' ∈ K.

Theorem 2 GPSG Category-Membership is PSPACE-hard 

Proof: By reduction from QBF. On input formula

Ω = Q_1 y_1 Q_2 y_2 ... Q_m y_m F(y_1, y_2, ..., y_m)

we construct an instance P of the Category-Membership problem in polynomial time, such that Ω ∈ QBF if and only if P is true.

Consider the QBF as a strictly balanced binary tree, where the i-th quantifier Q_i represents pairs of subtrees <T_t, T_f> such that (1) T_t and T_f each immediately dominate pairs of subtrees representing the quantifiers Q_{i+1} ... Q_m, and (2) the i-th variable y_i is true in T_t and false in T_f. All nodes at level i in the whole tree correspond to the quantifier Q_i. The leaves of the tree are different instantiations of the formula F, corresponding to the quantifier-determined truth assignments to the m variables. A leaf node is labeled true if the instantiated formula F that it represents is true. An internal node in the tree at level i is labeled true if

1. Q_i = "∃" and either daughter is labeled true, or

2. Q_i = "∀" and both daughters are labeled true.






Otherwise, the node is labeled false.

Similarly, categories can be understood as trees, where the features in the domain of a category constitute a node in the tree, and a category C immediately dominates all categories C' such that ∃f ∈ ((F − Atom) ∩ DOM(C))[C(f) = C'].

In the QBF reduction, the atomic-valued features are used to represent the m variables, the clauses of F, the quantifier the category represents, and the truth label of the category. The category-valued features represent the quantifiers — two category-valued features q_k, q'_k represent the subtree pairs <T_t, T_f> for the quantifier Q_k. FCRs maintain quantifier-imposed variable truth assignments "down the tree" and calculate the truth labeling of all leaves, according to F, and internal nodes, according to quantifier meaning.

Details. Let w be the string of formula literals in F, and w_i denote the i-th symbol in the string w. We specify a set K of permissible categories based on A, F, p, and the set of FCRs R s.t. the category [[LABEL 1]] or an extension of it is an element of K iff Ω is true.

First we define the set of possible 0-level categories, which encode the formula F and truth assignments to the formula variables. The feature w_i represents the formula literal w_i in w, y_j represents the variable y_j in Ω, and c_i represents the truth value of the i-th clause in F.

Atom = {LEVEL, LABEL}
       ∪ {w_i : 1 ≤ i ≤ |w|}
       ∪ {y_j : 1 ≤ j ≤ m}
       ∪ {c_i : 1 ≤ i ≤ |w|/3}
F − Atom = {q_k, q'_k : 1 ≤ k ≤ m}
p^0(LEVEL) = {k : 1 ≤ k ≤ m+1}
p^0(f) = {0, 1}   ∀f ∈ Atom − {LEVEL}

FCR's are included to constrain both the form and content of the guesses: 

1. FCRs to create strictly balanced binary trees:

   ∀k, 1 ≤ k ≤ m,

   [LEVEL k] ≡ [q_k [[y_k 1][LEVEL k+1]]] & [q'_k [[y_k 0][LEVEL k+1]]]

2. FCRs to ensure all 0-level categories are fully specified:

   ∀i, 1 ≤ i ≤ |w|/3,

   [c_i] ≡ [w_{3i−2}] & [w_{3i−1}] & [w_{3i}]
   [LABEL] ≡ [c_i]

   ∀k, 1 ≤ k ≤ m,

   [LABEL] ≡ [y_k]

3. FCRs to label internal nodes with truth values determined by quantifier meaning:

   ∀k, 1 ≤ k ≤ m,

   if Q_k = "∀", then include:
   [LEVEL k] & [LABEL 1] ≡ [q_k [[LABEL 1]]] & [q'_k [[LABEL 1]]]
   [LEVEL k] & [LABEL 0] ≡ [q_k [[LABEL 0]]] ∨ [q'_k [[LABEL 0]]]

   otherwise Q_k = "∃", and include:
   [LEVEL k] & [LABEL 1] ≡ [q_k [[LABEL 1]]] ∨ [q'_k [[LABEL 1]]]
   [LEVEL k] & [LABEL 0] ≡ [q_k [[LABEL 0]]] & [q'_k [[LABEL 0]]]

   The category-valued features q_k and q'_k represent the quantifier Q_k. In the category value of q_k, the formula variable y_k = 1 everywhere, while in the category value of q'_k, y_k = 0 everywhere.

4. one FCR to guarantee that only satisfiable assignments are permitted:

   [LEVEL 1] ⊃ [LABEL 1]

5. FCRs to ensure that quantifier assignments are preserved "down the tree":

   ∀i, k: 1 ≤ i < k ≤ m,

   [y_i 1] ⊃ [q_k [[y_i 1]]] & [q'_k [[y_i 1]]]
   [y_i 0] ⊃ [q_k [[y_i 0]]] & [q'_k [[y_i 0]]]

6. FCRs to instantiate variable assignments into the formula F:

   ∀i, k: 1 ≤ i ≤ |w| and 1 ≤ k ≤ m,
   if w_i = y_k, then include:
   [y_k 1] ⊃ [w_i 1]
   [y_k 0] ⊃ [w_i 0]

   else if w_i = ¬y_k, then include:
   [y_k 1] ⊃ [w_i 0]
   [y_k 0] ⊃ [w_i 1]

7. FCRs to verify the guessed variable assignments in leaf nodes:

   ∀i, 1 ≤ i ≤ |w|/3,

   [c_i 0] ≡ [w_{3i−2} 0] & [w_{3i−1} 0] & [w_{3i} 0]
   [c_i 1] ≡ [w_{3i−2} 1] ∨ [w_{3i−1} 1] ∨ [w_{3i} 1]
   [LEVEL m+1] & [c_i 0] ⊃ [LABEL 0]
   [LEVEL m+1] & [c_1 1] & [c_2 1] & ... & [c_{|w|/3} 1] ⊃ [LABEL 1]

The reduction constructs O(|w|) features and O(m^2) FCRs of size O(log m) in a simple manner, and consequently may be seen to be polynomial time.

Q.E.D.

The primary source of intractability in the theory of syntactic features is the large number of possible syntactic categories (arising from finite feature closure) in combination with the computational power of feature co-occurrence restrictions. 8 FCRs of the "disjunctive consequence" form [f v] ⊃ [f_1 v_1] ∨ ... ∨ [f_n v_n] compute the direct analogue of Satisfiability: when used in conjunction with other FCRs, the GPSG effectively must try all n feature-value combinations.
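To get a feel for the number of categories involved, the following Python sketch evaluates the counting recurrence behind the estimates in footnotes 4 and 8; the recurrence is a rough upper-bound model, not an exact census of K.

import math

def num_categories(a, b):
    """Rough count of categories under finite feature closure: each of the a
    atomic features is +, -, or undefined; each of j remaining category-valued
    features is undefined or takes a category built from the other j-1.
    Illustrative recurrence only (cf. footnotes 4 and 8)."""
    n = 3 ** a                      # 0-level categories
    for j in range(1, b + 1):
        n = 3 ** a * (1 + n) ** j
    return n

k = num_categories(25, 4)           # GKPS: 25 atomic, 4 category-valued features
print(len(str(k)))                  # 776 decimal digits, i.e. > 10^775
print(25 * sum(math.factorial(4) // math.factorial(i) for i in range(5)))
# 1625: the exponent in the 3^(a * b! * sum 1/i!) ~ 3^1625 estimate of footnote 8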



3 Complexity of GPSG-Recognition 

Two isolated membership problems for GPSG's component formal devices were 
considered above in an attempt to isolate sources of complexity in GPSG the- 
ory. In this section the recognition problem (RP) for GPSG theory as a whole 
is considered. I begin by arguing that the linguistically and computationally 
relevant recognition problem is the universal recognition problem, as opposed 
to the fixed language recognition problem. I then show that the former problem 
is exponential-polynomial (Exp-Poly) time-hard. 



8 Finite feature closure admits a surprisingly large number of possible categories. Given a specification (F, Atom, A, R, p) of K, let a = |Atom| and b = |F − Atom|. Assume that all atomic features are binary: a feature may be +, −, or undefined, and there are 3^a 0-level categories. The b category-valued features may each assume O(3^a) possible values in a 1-level category, so |K^1| = O(3^a (3^a)^b). More generally,

|K = K^b| = O(3^(a · Σ_{i=0}^{b} b!/i!)) = O(3^(a·b! · Σ_{i=0}^{b} 1/i!)) = O(3^(a·b!·e)) = O(3^(a·b!))

where Σ_{i=0}^{b} 1/i! converges to e ≈ 2.7 very rapidly and a, b = O(|G|); a = 25, b = 4 in GKPS. The smallest category in K will be 1 symbol (the null set), and the largest, maximally-specified category will be of symbol-size log |K| = O(a · b!).






3.1 Defining the Recognition Problem 

The universal recognition problem is: given a grammar G and input string x, is x ∈ L(G)? Alternately, the recognition problem for a class of grammars may be defined as a family of questions in one unknown. This fixed language recognition problem is: given an input string x, is x ∈ L for some fixed language L? For the fixed language RP, it does not matter which grammar is chosen to generate L — typically, the fastest grammar is picked.

It seems reasonably clear that the universal RP is of greater linguistic and engineering interest than the fixed language RP. The grammars licensed by linguistic theory assign structural descriptions to utterances, which may be used to query and update databases, interpreted semantically, translated into other human languages, and so on. The universal recognition problem — unlike the fixed language problem — determines membership with respect to a grammar, and therefore more accurately models the parsing problem, which must use a grammar to assign structural descriptions.

The universal RP also bears most directly on issues of natural language acquisition. The language learner evidently possesses a mechanism for selecting grammars from the class of learnable natural language grammars L_G on the basis of linguistic inputs. The more fundamental question for linguistic theory, then, is "what is the recognition complexity of the class L_G?". If this problem should prove computationally intractable, then the (potential) tractability of the problem for each language generated by a G in the class is only a partial answer to the linguistic questions raised.

Finally, complexity considerations favor the universal RP. The goal of a complexity analysis is to characterize the amount of computational resources (e.g. time, space) needed to solve the problem in terms of all computationally relevant inputs on some standard machine model (typically, a multi-tape deterministic Turing machine). We know that both input string length and grammar size and structure affect the complexity of the recognition problem. Hence, excluding either input from complexity consideration would not advance our understanding. 9

Linguistics and computer science are primarily interested in the universal recognition problem because both disciplines are concerned with the formal power of a family of grammars. Linguistic competence and performance must be considered in the larger context of efficient language acquisition, while computational considerations demand that the recognition problem be characterized in terms of both input string and grammar size.



9 This "consider all relevant inputs" methodology is universally assumed in the formal language and computational complexity literature. For example, Hopcroft and Ullman (1979:139) define the context-free grammar recognition problem as: "Given a CFG G = (V, T, P, S) and a string x in T*, is x in L(G)?". Garey and Johnson (1979) is a standard reference work in the field of computational complexity. All 10 automata and language recognition problems covered in the book (pp. 265-271) are universal, i.e. of the form "Given an instance of a machine/grammar and an input, does the machine/grammar accept the input?" The complexity of these recognition problems is always calculated in terms of grammar and input size.






Excluding grammar size from complexity consideration in order to argue that the recognition problem for a family of grammars is tractable is akin to fixing the size of the chess board in order to argue that winning the game of chess is tractable: neither claim advances our scientific understanding of chess or natural language.

3.2 GPSG-Recognition is Exp-Poly hard 

The following proof reduces instances of a polynomial-space bounded alternating 
Turing Machine computation to instances of GPSG-Recognition. 

Definition: 10 

An alternating Turing Machine is like a nondeterministic TM, except that 
some subset of its states will be referred to as universal states, and the remainder 
as existential states. A nondeterministic TM is an alternating TM with no 
universal states. 

A k-tape alternating Turing Machine (ATM) is a 10-tuple M = ⟨Q, Σ, Γ, $, #, k, δ, q_0, F, U⟩, where

Q, q_0, F = set of states, initial state, set of accepting states, respectively

Σ, Γ = input, tape alphabets, Σ ⊂ Γ

$, # = endmarker, blank symbol; $, # ∈ Γ − Σ

k = number of read-write tapes, k ≥ 1

δ ⊆ (Q × Γ^(k+1)) × (Q × Γ^k × {L, R}^(k+1)) is the next move relation

U = set of universal states, U ⊆ Q

Q − U = set of existential states

The TM has a read-only input tape, with the input w ∈ Σ* written as $w$, and the reading head initialized to the first symbol of w. The k work tapes are one-way infinite, and are initially blank. A configuration of the ATM consists of the state, head positions, and contents of the k + 1 tapes. One move of the TM consists of reading one symbol from the input tape and moving the heads left/right as allowed by δ, along with a change of state of the machine. No move is allowed if the head on the input tape is not within the two end markers (inclusive). If C is a configuration of M, let Next_M(C) be the set of possible configurations after one move of M. We say a configuration is existential (resp.



10 Taken from Chandra and Stockmeyer(1976), with the restriction that the work tapes are 
one-way infinite, instead of two-way infinite. 






universal, accepting) if the state of the TM in that configuration is an existential 
(resp. universal, accepting) state. 

The computation of an alternating TM M on an input w is a possibly infinite tree where the nodes correspond to TM configurations. Technically, each node is of the form (x, C) where x ∈ {0, 1, ...}*, and C is the configuration. The root is (ε, C_0) where C_0 is the initial configuration, and the sons of (x, C) are all elements (xi, C_i) where Next_M(C) = {C_0, C_1, ..., C_k}. The criterion for acceptance is as follows. Let N be the set of nodes in the computation tree of TM M on input w. We label the nodes of the tree either true or false according to the following criteria. A labeling L : N → {true, false} is said to be acceptable if

1. if C is an accepting configuration,

   L(x, C) = true

2. if C is an existential configuration,

   L(x, C) = ⋁_{C_i ∈ Next_M(C)} L(xi, C_i)

3. if C is a universal configuration,

   L(x, C) = ⋀_{C_i ∈ Next_M(C)} L(xi, C_i)

By convention, ⋁ (resp. ⋀) of an empty set is false (resp. true). M accepts the input w if and only if L(ε, C_0) = true for all acceptable labelings L.

Note that alternating TMs without universal states accept exactly as do the 
corresponding nondeterministic TMs. 
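The acceptance criterion can be stated compactly in code; the following Python sketch labels an already-unfolded finite computation tree, with the dictionary encoding of nodes an illustrative assumption.

# Label an (already unfolded, finite) ATM computation tree according to the
# acceptance criterion: accepting nodes are true, existential nodes are the OR
# of their sons, universal nodes the AND (OR of nothing is False, AND is True).
# The dict encoding of nodes is an illustrative assumption.

def label(node):
    if node["accepting"]:
        return True
    sons = [label(s) for s in node["sons"]]
    if node["kind"] == "existential":
        return any(sons)        # disjunction of an empty set is False
    return all(sons)            # conjunction of an empty set is True

# A toy tree: a universal root whose two existential sons each reach an
# accepting configuration (one of them also has a rejecting dead end).
dead_end = {"kind": "existential", "accepting": False, "sons": []}
accept = {"kind": "universal", "accepting": True, "sons": []}
e1 = {"kind": "existential", "accepting": False, "sons": [accept, dead_end]}
e2 = {"kind": "existential", "accepting": False, "sons": [accept]}
root = {"kind": "universal", "accepting": False, "sons": [e1, e2]}
print(label(root))              # True: both universal branches can reach acceptance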

Theorem 3 GPSG-Recognition is Exp-Poly time-hard 



Proof: By direct simulation of a polynomial space bounded alternating Turing Machine M on input w. 11



11 Without loss of generality, we use a 1-tape ATM, so

δ ⊆ (Q × Γ × Γ) × (Q × Γ × {L, R} × {L, R})



Let S(n) be a polynomial in n. Then, on input M, an S(n) space-bounded one-tape alternating Turing Machine (ATM), and string w, we construct a GPSG G in polynomial time such that w ∈ L(M) iff $0 w_1 1 w_2 2 ... w_n n $ n+1 ∈ L(G).

By Chandra and Stockmeyer(1976),

ASPACE(S(n)) = ⋃_{c>0} DTIME(c^S(n))

where ASPACE(S(n)) is the class of problems solvable in space S(n) on an ATM, and DTIME(F(n)) is the class of problems solvable in time F(n) on a deterministic Turing Machine. As a consequence of this result and our following proof, we have the immediate result that GPSG-Recognition is DTIME(c^S(n))-hard, for all constants c, or Exp-Poly time-hard.

The nodes of the ATM computation tree are represented by syntactic categories in K^0 — one feature for every tape square, plus three features to encode the ATM tape head positions and the current state. The reduction is limited to specifying a polynomial number of features in polynomial time; since these features are used to encode the ATM tape, the reduction may only specify polynomial space bounded ATM computations.

The ID rules encode the ATM Next_M() relation, i.e. C → Next_M(C) for a universal configuration C. The reduction constructs an ID rule for every combination of possible head position, machine state, and symbol on the scanned tape square. Principles of universal feature instantiation transfer the rest of the instantaneous description (i.e. contents of the tape) from mother to daughters in ID rules.

Let Next_M(C) = {C_0, C_1, ..., C_k}. If C is a universal configuration, then we construct an ID rule of the form

C → C_0, C_1, ..., C_k                                      (6)

Otherwise, C is an existential configuration and we construct the k + 1 ID rules

C → C_i        ∀i, 0 ≤ i ≤ k                                (7)

A universal ATM configuration is labeled accepting if and only if it has halted 
and accepted, or if all of its daughters are labeled accepting. We reproduce this 
with the ID rules in 6 (or 8), which will be admissible only if all subtrees rooted 
by the RHS nodes are also admissible. 

An existential ATM configuration is labeled accepting if and only if it has 
halted and accepted, or if one of its daughters is labeled accepting. We reproduce 






this with the ID rules in 7 (or 9), which will be admissible only if one subtree rooted by a RHS node is admissible.

All features that represent tape squares are declared to be in the HEAD 
feature set, and all daughter categories in the constructed ID rules are head 
daughters, thus ensuring that the Head Feature Convention (HFC) will transfer 
the tape contents of the mother to the daughter(s), modulo the tape writing 
activity specified by the next move relation. 
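The intended effect of the HFC in this reduction can be pictured with a small Python sketch, in which the daughter category specifies only what the move changes and the remaining HEAD (tape-square) features are inherited from the mother; the dictionary encoding is an illustrative assumption, not GKPS's statement of the convention.

# Mother and daughter categories as feature->value dicts; HEAD features are the
# tape-square features.  Free instantiation under the HFC copies every mother
# HEAD feature that the daughter does not already specify.  Illustrative only.

def hfc_instantiate(mother, daughter, head_features):
    inherited = {f: v for f, v in mother.items()
                 if f in head_features and f not in daughter}
    return {**daughter, **inherited}

head = {"t1", "t2", "t3"}                               # three work-tape squares
mother = {"STATE": "q", "HEAD1": 2, "t1": "a", "t2": "b", "t3": "#"}
daughter = {"STATE": "p", "HEAD1": 3, "t2": "c"}        # the move writes c on square 2
print(hfc_instantiate(mother, daughter, head))
# {'STATE': 'p', 'HEAD1': 3, 't2': 'c', 't1': 'a', 't3': '#'}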

Details. Let

Result0_M(i, a, d) = { [[HEAD0 i+1], [i a], [A 1]]   if d = R
                     { [[HEAD0 i−1], [i a], [A 1]]   if d = L

Result1_M(j, c, p, d) = { [[HEAD1 j+1], [τ_j c], [STATE p]]   if d = R
                        { [[HEAD1 j−1], [τ_j c], [STATE p]]   if d = L

Trans_M(q, a, b) = {(p, c, d_1, d_2) : ((q, a, b), (p, c, d_1, d_2)) ∈ δ}

where

a is the read-only (R/O) tape symbol currently being scanned

b is the read-write (R/W) tape symbol currently being scanned

d_1 is the R/O tape direction

d_2 is the R/W tape direction

The GPSG G contains: 



1. Feature definitions 

A category in K° represents a node of an ATM computation tree, where 
the features in Atom encode the ATM configuration. Labeling is per- 
formed by ID rules. 




(a) definition of F, Atom, A

    F = Atom = {STATE, HEAD0, HEAD1, A}
               ∪ {i : 0 ≤ i ≤ |w|+1}
               ∪ {τ_j : 1 ≤ j ≤ S(|w|)}
    A = Q ∪ Σ ∪ Γ    ; as defined earlier

(b) definition of p^0

    p^0(A) = {1, 2, 3}
    p^0(STATE) = Q    ; the ATM state set
    p^0(HEAD0) = {j : 1 ≤ j ≤ |w|}
    p^0(HEAD1) = {i : 1 ≤ i ≤ S(|w|)}
    ∀f ∈ {i : 0 ≤ i ≤ |w|+1},
        p^0(f) = Σ ∪ {$}    ; the ATM input alphabet
    ∀f ∈ {τ_j : 1 ≤ j ≤ S(|w|)},
        p^0(f) = Γ    ; the ATM tape alphabet

(c) definition of the HEAD feature set

    HEAD = {i : 0 ≤ i ≤ |w|+1} ∪ {τ_j : 1 ≤ j ≤ S(|w|)}

(d) FCRs to ensure full specification of all categories except null ones:

    ∀f, f ∈ Atom, [STATE] ⊃ [f]

2. Grammatical rules

   ∀i, j, q, a, b : 1 ≤ i ≤ |w|, 1 ≤ j ≤ S(|w|), q ∈ Q, a ∈ Σ, b ∈ Γ,
   if Trans_M(q, a, b) ≠ ∅, construct the following ID rules.

   (a) if q ∈ U (universal state)

       {[HEAD0 i], [i a], [HEAD1 j], [τ_j b], [STATE q], [A 1]} →
           {Result0_M(i, a, d_{1k}) ∪ Result1_M(j, c_k, p_k, d_{2k}) :      (8)
            (p_k, c_k, d_{1k}, d_{2k}) ∈ Trans_M(q, a, b)}

       where all categories on the RHS are heads.

   (b) otherwise q ∈ Q − U (existential state)

       ∀(p_k, c_k, d_{1k}, d_{2k}) ∈ Trans_M(q, a, b),

       {[HEAD0 i], [i a], [HEAD1 j], [τ_j b], [STATE q], [A 1]} →           (9)
           Result0_M(i, a, d_{1k}) ∪ Result1_M(j, c_k, p_k, d_{2k})

       where all categories on the RHS are heads.




   (c) One ID rule to terminate accepting states, using null-transitions.

       {[STATE h], [τ_1 Y]} → ε                                             (10)

   (d) Two ID rules to read input strings and begin the ATM simulation.
       The A feature is used to separate functionally distinct components
       of the grammar. [A 1] categories participate in the direct ATM
       simulation, [A 2] categories are involved in reading the input string,
       and the [A 3] category connects the read input string with the ATM
       simulation start state.

       START → {[A 1]}, {[A 2]}                                             (11)
       {[A 2]} → {[A 2]}, {[A 2]}

       where all daughters are head daughters, and where

       START = {[HEAD0 1], [HEAD1 1], [STATE s], [A 3]}
               ∪ {[τ_j #] : 1 ≤ j ≤ S(|w|)}

   (e) the lexical rules,

       ∀σ, i : σ ∈ Σ, 1 ≤ i ≤ |w|,

       ⟨σi, {[A 2], [i σ]}⟩                                                 (12)

       ∀i : 0 ≤ i ≤ |w|+1,

       ⟨$i, {[A 2], [i $]}⟩

The reduction plainly may be performed in polynomial time in the size of 
the simulated ATM, by inspection. 
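As a back-of-the-envelope check, the following Python sketch bounds the number of grammar components the reduction writes down for given ATM parameters; the counting is deliberately rough (existential states contribute one rule per transition, so the ID rule count is only an order-of-magnitude bound), and the parameter names are assumptions.

# Rough upper bounds on the size of the constructed GPSG as a function of the
# ATM parameters: input length |w|, space bound S = S(|w|), and the sizes of
# Q, Sigma (input alphabet), Gamma (tape alphabet), delta (move relation).
# Everything is polynomial in the ATM description and |w|, which is the point
# of the claim above.  The exact counts are illustrative, not from the paper.

def gpsg_size(w_len, space, n_states, n_sigma, n_gamma, n_delta):
    features = 4 + (w_len + 2) + space        # STATE, HEAD0, HEAD1, A, the i's, the tau_j's
    id_rules = (w_len * space * n_states * n_sigma * n_gamma * n_delta  # rules (8)/(9), at most
                + 1 + 2)                      # rule (10) plus the two rules in (11)
    lexical = n_sigma * w_len + (w_len + 2)   # lexical entries (12)
    return features, id_rules, lexical

print(gpsg_size(w_len=10, space=100, n_states=5, n_sigma=2, n_gamma=4, n_delta=3))
# (116, 120003, 32): all polynomial in the ATM description and |w|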

No metarules or LP statements are needed, although metarules could have 
been used instead of the Head Feature Convention. Both devices are capable of 
transferring the contents of the ATM tape from the mother to the daughter(s). 
One metarule would be needed for each tape square/tape symbol combination 
in the ATM. 

GKPS Definition 5.14 of Admissibility guarantees that admissible trees must be terminated. 12 By the construction above — see especially the ID rule 10 — an [A 1] node can be terminated only if it is an accepting configuration (i.e. it has halted and printed Y on its first square).



12 The admissibility of nonlocal trees is defined as follows (GKPS, p.104):

Definition: Admissibility

Let R be a set of ID rules. Then a tree t is admissible from R if and only if

1. t is terminated, and

2. every local subtree in t is either terminated or locally admissible from some r ∈ R.





This means the only admissible trees are accepting ones whose yield is the input string followed by a very long empty string.

Q.E.D.



3.3 Sources of Intractability 

The two sources of intractability in GPSG theory spotlighted by this reduction 
are null-transitions in ID rules (see the ID rule 10 above), and universal feature 
instantiation (in this case, the Head Feature Convention). 

Grammars with unrestricted null-transitions can assign elaborate phrase structure to the empty string, which is linguistically undesirable and computationally costly. The reduction must construct a GPSG G and input string x in polynomial time such that x ∈ L(G) iff w ∈ L(M), where M is a PSPACE-bounded ATM with input w. The 'polynomial time' constraint prevents us from making either x or G too big. Null-transitions allow the grammar to simulate the PSPACE ATM computation (and an Exp-Poly TM computation indirectly) with an enormously long derivation string and then erase the string. If the GPSG G were unable to erase the derivation string, G would only accept strings which were exponentially larger than M and w, i.e. too big to write down in polynomial time.

The Head Feature Convention transfers HEAD feature values from the mother to the head daughters just in case they don't conflict. In the reduction we use HEAD features to encode the ATM tape, and thereby use the HFC to transfer the tape contents from one ATM configuration C (represented by the mother) to its immediate successors C_0, ..., C_n (the head daughters). The configurations C, C_0, ..., C_n have identical tapes, with the critical exception of one tape square. If the HFC enforced absolute agreement between the HEAD features of the mother and head daughters, we would be unable to simulate the PSPACE ATM computation in this manner.



4 Interpreting the Result 

4.1 Generative Power and Computational Complexity 

At first glance, a proof that GPSG-Recognition is Exp-Poly hard appears to contradict the fact that context-free languages can be recognized in O(n^3) time by a wide range of algorithms. To see why there is no contradiction, we must first






explicitly state the argument from weak context-free generative power, which 
we dub the efficient parsability (EP) argument. 

The EP argument states that any GPSG can be converted into a weakly 
equivalent context-free grammar (CFG), and that CFG-Recognition is polyno- 
mial time; therefore, GPSG-Recognition must also be polynomial time. The 
EP argument continues: if the conversion is fast, then GPSG-Recognition is 
fast, but even if the conversion is slow, recognition using the "compiled" CFG 
will still be fast, and we may justifiably lose interest in recognition using the 
original, slow, GPSG. 

The EP argument is misleading because it ignores both the effect conversion has on grammar size, and the effect grammar size has on recognition speed. Crucially, grammar size affects recognition time in all known algorithms, and the only grammars directly usable by context-free parsers, i.e. with the same complexity as a CFG, are those composed of context-free productions with atomic nonterminal symbols. For GPSG, this is the set of admissible local trees, and this set is astronomical:

O((3^(m·m!))^(m^(2^m)))                                     (13)

in a GPSG G of size m. 13



Context-free parsers like the Earley algorithm run in time O(|G'|^2 n^3), where |G'| is the size of the CFG G' and n the input string length, so a GPSG G of size m will be recognized in time

O(3^(2·m·m!·m^(2^m)) · n^3)                                 (14)

The hyper-exponential term will dominate the Earley algorithm complexity in the reduction above because m is a function of the size of the ATM we are simulating.


13 As we saw above, the metarule finite closure operation can increase the ID rule grammar size from |R| = O(|G|) to O(m^(2^m)) in a GPSG G of size m. We ignore the effects of ID/LP format on the number of admissible local trees here, and note that if we expanded out all admissible linear precedence possibilities in FC(M,R), the resultant 'ordered' ID rule grammar would be of size O(m^(2^m)!). In the worst case, every symbol in FC(M,R) is underspecified, and every category in K extends every symbol in the FC(M,R) grammar. Since there are

O(3^(m·m!))

possible syntactic categories, and O(m^(2^m)) symbols in FC(M,R), the number of admissible local trees (= atomic context-free productions) in G is

O((3^(m·m!))^(m^(2^m))) = O(3^(m·m!·m^(2^m)))

i.e. astronomical. Ristad(1986) argues that the minimal set of admissible local trees in GKPS' GPSG for English is considerably smaller, yet still contains more than 10^33 local trees.






Even if the GPSG is held constant, the stunning derived grammar size in formula 13 turns up as an equally stunning 'constant' multiplicative factor in 14, which in turn will dominate the real-world performance of the Earley algorithm for all expected inputs (i.e. any that can be written down in the universe), every time we use the derived grammar. 14

Pullum(1985) has suggested that "examination of a suitable 'typical' GPSG 
description reveals a ratio of only 4 to 1 between expanded and unexpanded 
grammar statements," strongly implying that GPSG is efficiently processable as 
a consequence. 15 But this "expanded grammar" is not adequately expanded, i.e. 
it is not composed of context-free productions with unanalyzable nonterminal 
symbols. 16 These informal tractability arguments are a particular instance of 
the more general EP argument and are equally misleading. 

The preceding discussion of how intractability arises when converting a 
GPSG into a weakly equivalent CFG does not in principle preclude the ex- 
istence of an efficient compilation step. If the compiled grammar is truly fast 
and assigns the same structural descriptions as the uncompiled GPSG, and it 
is possible to compile the GPSG in practice, then the complexity of the univer- 
sal recognition problem would not accurately reflect the real cost of parsing. 17 



14 The compiled grammar recognition problem is at least as intractable as the uncompiled 
one. Even worse, Barton(1985) shows how the grammar expansion increases both the space 
and time costs of recognition, when compared to the cost of using the grammar directly. 

15 This substantive argument does not appear to fit in with the GKPS goal of a purely formal investigation of linguistics: "The universalism [of natural language] is, ultimately, intended to be entirely embodied in the formal system, not expressed by statements made in it" (GKPS, p.4). It is difficult to respond precisely to the claims made in Pullum(1985), since the abstract is (necessarily) brief and consists of assertions unsupported by factual documentation or clarifying assumptions.

16 "Expanded grammar" appears to refer to the output of metarule finite closure (i.e. ID 
rules), and this expanded grammar is tractable only if the grammar is directly usable by the 
Earley algorithm exactly as context-free productions are: all nonterminals in the context-free 
productions must be unanalyzable. But the categories and ID rules of the metarule finite 
closure grammar do not have this property. Nonterminals in GPSG are decomposable into 
a complex set of feature specifications and cannot be made atomic, in part because not all 
extensions of ID rule categories are legal. For example, the categories VP[+IHV, VFOBM PAS] 
and VP[+INV, VFORM FIN] are not legal extensions of VP in English, while VP[+IHV, +AUX, 
VFORM FIN] is. FCRs, FSDs, LP statements, and principles of universal feature instantiation 
— all of which contribute to GPSG's intractability — must all still apply to the rules of this 
expanded grammar. 

Even if we ignore the significant computational complexity introduced by the machinery 
mentioned in the previous paragraph (i.e. theory of syntactic features, FCRs, FSDs, ID/LP 
format, null-transitions, and metarules), GPSG will still not obtain an efficient parsabSity result 
This is because the Head Feature Convention alone ensures that the universal recognition 
problem for GPSGs will be NP-hard and likely to be intractable. Ristad(1986) contains 
a proof. This result should not be surprising, given that (1) principles of universal feature 
instantiation in current GPSG theory replace the metarules of earlier versions of GPSG theory, 
and (2) metarules are known to cause intractability in GPSG. 

17 The existence or nonexistence of efficient compilation functions does not affect either our scientific interest in the universal grammar recognition problem or the power and relevance of a complexity analysis. If complexity theory classifies a problem as intractable, we learn that something more must be said to obtain tractability, and that any efficient compilation step, if it exists at all, must itself be costly.





But until such a suggestion is forthcoming, we must assume that it does not exist. 18, 19



4.2 Complexity and Succinctness 

The major complexity result of this paper proves that the fastest algorithm for 
GPSG-Recognition must take more than exponential time. The immediately 
preceding section demonstrates exactly how a particular algorithm for GPSG- 
Recognition (the EP argument) comes to grief: weak context-free generative 
power does not ensure efficient parsability because a GPSG G is weakly equiva- 
lent to a very large CFG G', and CFG size affects recognition time. The rebut- 
tal does not suggest that computational complexity arises from representational 
succinctness, either here or in general. 

Complexity results characterize the amount of resources needed to solve 
instances of a problem, while succinctness results measure the space reduction 
gained by one representation over another, equivalent, representation. 

There is no causal connection between computational complexity and representational succinctness, either in practice or principle. In practice, converting one grammar into a more succinct one can either increase or decrease the recognition cost. For example, converting an instance of context-free recognition (known to be polynomial time) into an instance of context-sensitive recognition


18 Note that the GPSG we constructed in the preceding reduction will actually accept any input x of length less than or equal to |w| if and only if the ATM M accepts it using S(|w|) space. We prepare an input string x for the GPSG by converting it to the string $0 x_1 1 x_2 2 ... x_n n $ n+1, e.g. abadee is accepted by the ATM if and only if the string $0a1b2a3d4e5e6$7 is accepted by the GPSG. Trivial changes in the grammar allow us to permute and "spread" the characters of x across an infinite class of strings in an unbounded number of ways, e.g. $0 γ_1 x_1 1 γ_2 x_2 2 ... γ_n x_n n $ n+1, where each γ_i is a string over an alphabet which is distinct from the x_i alphabet. Although the flexibility of this construction results in a more complicated GPSG, it argues powerfully against the existence of any efficient compilation procedure for GPSGs. Any efficient compilation procedure must perform more than an exponential polynomial amount of work (GPSG-Recognition takes at least Exp-Poly time) on at least an exponential number of inputs (all inputs that fit in the |w| space of the ATM's read-only tape). More importantly, the required compilation procedure will convert any exponential-polynomial time bounded Turing Machine into a polynomial-time TM for the class of inputs whose membership can be determined within an arbitrary (fixed) exp-poly time bound. Simply listing the accepted inputs will not work because both the GPSG and TM may accept an infinite class of inputs. Such a compilation procedure would be extremely powerful.

"Note that compilation illegitimately assumes that the compilation step is free. There is 
one theory of primitive language learning and use: conjecture a grammar and use it. For this 
procedure to work, grammars should be easy to test on small inputs. The overall complexity 
of learning, testing, and speech must be considered. Compilation speeds up the speech com- 
ponent at the expense of greater complexity in the other two components. For this linguistic 
reason the compilation argument is suspect. 






(known to be PSPACE-complete and likely to be intractable) can significantly 
speed the recognition problem if the conversion decreases the size of the CFG 
logarithmically or better. Even more strangely, increasing ambiguity in a CFG 
can speed recognition time if the succinctness gain is large enough, or slow it 
down otherwise — unambiguous CFGs can be recognized in linear time, while 
ambiguous ones require cubic time. 

In principle, tractable problems may involve succinct representations. For 
example, the iterating coordination schema (ICS) of GPSG is an unbeatably 
succinct encoding of an infinite set of context-free rules; from a computational 
complexity viewpoint, the ICS is utterly trivial using a slightly modified Earley 
algorithm. 20 Tractable problems may also be verbosely represented: consider a 
random finite language, which may be recognized in essentially constant time on 
a typical computer (using a hash table), yet whose elements must be individually 
listed. Similarly, intractable problems may be represented both succinctly and 
nonsuccinctly. As is well known, the Turing machine for any arbitrary r.e. set 
may be either extremely small or monstrously big. Winning the game of chess 
when played on an n X n board is likely to be computationally intractable, yet 
the chess board is not intended to be an encoding of another representation, 
succinct or otherwise. 

Tractable problems may involve succinct or nonsuccinct representations, as 
may intractable problems. The reductions in this paper show that GPSGs 
are not merely succinct encodings of some context-free grammars; they are 
inherently complex grammars for some context-free languages. The heart of 
the matter is that GPSG's formal devices are computationally complex and can 
encode provably intractable problems. 

4.3 Relevance of the Result 

In this paper, we argued that there is nothing in the GPSG formal framework 
that guarantees computational tractability: proponents of GPSG must look else- 
where for an explanation of efficient parsability, if one is to be given at all. 
The crux of the matter is that the complex components of GPSG theory in- 
teract in intractable ways, and that weak context-free generative power does 
not guarantee tractability when grammar size is taken into account. A faithful 
implementation of the GPSG formalisms of GKPS will provably be intractable; 
expectations computational linguistics might have held in this regard are not 
fulfilled by current GPSG theory. 

This formal property of GPSGs is straightforwardly interesting to GPSG linguists.



20 A more extreme example of the unrelatedness of succinctness and complexity is the absolute succinctness with which the dense language Σ* may be represented — whether by a regular expression, CFG, or even Turing machine — yet members of Σ* may be recognized in constant time (i.e. always accept).






As outlined by GKPS, "an important goal of the GPSG approach to linguistics [is] the construction of theories of the structure of sentences under which significant properties of grammars and languages fall out as theorems as opposed to being stipulated as axioms" (p.4).

The role of a computational analysis of the sort provided here is fundamen- 
tally positive: it can offer significant formal insights into linguistic theory and 
human language, and suggest improvements in linguistic theory and real-world 
parsers. The insights gained may be used to revise the linguistic theory so that 
it is both stronger linguistically and weaker formally. Work on revising GPSG is 
in progress. Briefly, some proposed changes suggested by the preceding reduc- 
tions are: unit feature closure, no FCRs or FSDs, no null-transitions in ID rules, 
metarule unit closure, and no problematic feature specifications in the princi- 
ples of universal feature instantiation. Not only do these restrictions alleviate 
most of GPSG's computational intractability, but they increase the theory's 
linguistic constraint and reduce the number of nonnatural language grammars 
licensed by the theory. Unfortunately, there is insufficient space to discuss these 
proposed revisions here — the reader is referred to Ristad(1986) for a complete 
discussion. 



Acknowledgments. Robert Berwick, Jim Higginbotham, and Richard Lar- 
son greatly assisted the author in writing this paper. The author is also indebted 
to Sandiway Fong, W. Eric Grimson, and David Waltz for their help, and to 
the MIT Artificial Intelligence Lab and Thinking Machines Corporation for sup- 
porting this research. 



5 References 



Barton, G.E. (1985). "On the Complexity of ID/LP Parsing," Computational 
Linguistics, 11(4): 205-218. 

Chandra, A. and L. Stockmeyer (1976). "Alternation," 17th Annual Symposium on Foundations of Computer Science: 98-108.

Gazdar, G. (1981). "Unbounded Dependencies and Coordinate Structure," Lin- 
guistic Inquiry 12: 155-184. 

Gazdar, G., E. Klein, G. Pullum, and I. Sag (1985). Generalized Phrase Struc- 
ture Grammar. Oxford, England: Basil Blackwell. 

Garey, M., and D. Johnson (1979). Computers and Intractability. San Francisco: W.H. Freeman and Co.

Hopcroft, J.E., and J.D. Ullman (1979). Introduction to Automata Theory, 
Languages, and Computation. Reading, MA: Addison-Wesley. 






Pullum, G.K. (1985). "The Computational Tractability of GPSG," Abstracts of the 60th Annual Meeting of the Linguistic Society of America, Seattle, WA: 36.

Ristad, E.S. (1985). "GPSG-Recognition is NP-hard," A.I. Memo No. 837, 
Cambridge, MA: M.I.T. Artificial Intelligence Laboratory. 

Ristad, E.S. (1986). "Complexity of Linguistic Models: A Computational Anal- 
ysis and Reconstruction of Generalized Phrase Structure Grammar," S.M. 
Thesis, MIT Department of Electrical Engineering and Computer Science. 
(In progress). 


